
    Maximizing Neutrality in News Ordering

    The detection of fake news has received increasing attention over the past few years, but there are more subtle ways of deceiving one's audience. In addition to the content of news stories, their presentation can also be made misleading or biased. In this work, we study the impact of the ordering of news stories on audience perception. We introduce the problems of detecting cherry-picked news orderings and maximizing neutrality in news orderings. We prove hardness results and present several algorithms for approximately solving these problems. Furthermore, we provide extensive experimental results and present evidence of potential cherry-picking in the real world. Comment: 14 pages, 13 figures, accepted to KDD '2

    Querying Large Language Models with SQL

    In many use-cases, information is stored in text but not available in structured data. However, extracting data from natural language text to precisely fit a schema, and thus enable querying, is a challenging task. With the rise of pre-trained Large Language Models (LLMs), there is now an effective solution to store and use information extracted from massive corpora of text documents. Thus, we envision the use of SQL queries to cover a broad range of data that is not captured by traditional databases by tapping the information in LLMs. To ground this vision, we present Galois, a prototype based on a traditional database architecture, but with new physical operators for querying the underlying LLM. The main idea is to execute some operators of the query plan with prompts that retrieve data from the LLM. For a large class of SQL queries, querying LLMs returns well-structured relations, with encouraging qualitative results. Preliminary experimental results make pre-trained LLMs a promising addition to the field of database systems, introducing a new direction for hybrid query processing. However, we pinpoint several research challenges that must be addressed to build a DBMS that exploits LLMs. While some of these challenges necessitate integrating concepts from the NLP literature, others offer novel research avenues for the DB community. Comment: Accepted for presentation at EDBT 2024 as Vision paper
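    The abstract's main idea, executing some query-plan operators with prompts that retrieve data from the LLM, can be illustrated with a minimal sketch. This is not the actual Galois implementation; the operator names and the prompt wording are hypothetical, and the LLM call is mocked with a deterministic stub so the example is self-contained.

```python
def llm_complete(prompt: str) -> str:
    """Stub standing in for a real LLM API call; returns canned rows."""
    return "France, Paris\nItaly, Rome"

def llm_scan(table: str, columns: list[str]) -> list[tuple[str, ...]]:
    """Hypothetical physical operator: prompt the LLM for the rows of a
    virtual table, then parse the answer into tuples."""
    prompt = (f"List the rows of the table {table}({', '.join(columns)}) "
              f"as comma-separated values, one row per line.")
    raw = llm_complete(prompt)
    return [tuple(cell.strip() for cell in line.split(","))
            for line in raw.splitlines() if line.strip()]

def select(rows, predicate):
    """Ordinary relational selection, applied on top of the LLM scan."""
    return [r for r in rows if predicate(r)]

rows = llm_scan("countries", ["name", "capital"])
print(select(rows, lambda r: r[1] == "Rome"))  # [('Italy', 'Rome')]
```

    The point of the design is that only the leaf operators (scans) talk to the LLM; the rest of the plan stays classical relational algebra, which is what makes hybrid query processing possible.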

    Pythia: Unsupervised generation of ambiguous textual claims from relational data

    Applications such as computational fact checking and data-to-text generation exploit the relationship between relational data and natural language text. Despite promising results in these areas, state-of-the-art solutions simply fail in managing “data-ambiguity”, i.e., the case when there are multiple interpretations of the relationship between the textual sentence and the relational data. To tackle this problem, we introduce Pythia, a system that, given a relational table D, generates textual sentences that contain factual ambiguities w.r.t. the data in D. Such sentences can then be used to train target applications in handling data-ambiguity. In this demonstration, we first show how our system generates data-ambiguous sentences for a given table in an unsupervised fashion by data profiling and query generation. We then demonstrate how two existing applications benefit from Pythia’s generated sentences, improving the state-of-the-art results. The audience will interact with Pythia by interactively changing input parameters, including uploading their own dataset to see which data-ambiguous sentences are generated for it.

    Variable Selection in Maximum Mean Discrepancy for Interpretable Distribution Comparison

    Two-sample testing decides whether two datasets are generated from the same distribution. This paper studies variable selection for two-sample testing, the task being to identify the variables (or dimensions) responsible for the discrepancies between the two distributions. This task is relevant to many problems of pattern analysis and machine learning, such as dataset shift adaptation, causal inference and model validation. Our approach builds on a two-sample test based on the Maximum Mean Discrepancy (MMD). We optimise the Automatic Relevance Detection (ARD) weights defined for individual variables to maximise the power of the MMD-based test. For this optimisation, we introduce sparse regularisation and propose two methods for dealing with the issue of selecting an appropriate regularisation parameter. One method determines the regularisation parameter in a data-driven way, and the other aggregates the results of different regularisation parameters. We confirm the validity of the proposed methods by systematic comparisons with baseline methods, and demonstrate their usefulness in exploratory analysis of high-dimensional traffic simulation data. Preliminary theoretical analyses are also provided, including a rigorous definition of variable selection for two-sample testing.
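    The core quantity, an MMD estimate built on a kernel with per-variable ARD weights, can be sketched as follows. This is an illustrative simplification, not the paper's method: the bandwidth choice and the biased estimator are assumptions, and no test-power optimisation or sparse regularisation is shown, only why a zero ARD weight removes a variable's contribution to the discrepancy.

```python
import numpy as np

def ard_gaussian_kernel(X, Y, w):
    """Gaussian kernel with per-dimension ARD weights w; a zero weight
    removes that variable from the comparison entirely."""
    Xw, Yw = X * w, Y * w
    sq = (np.sum(Xw**2, 1)[:, None] + np.sum(Yw**2, 1)[None, :]
          - 2.0 * Xw @ Yw.T)
    return np.exp(-sq / (2.0 * X.shape[1]))  # simple fixed bandwidth

def mmd2_biased(X, Y, w):
    """Biased estimate of the squared MMD between samples X and Y."""
    return (ard_gaussian_kernel(X, X, w).mean()
            + ard_gaussian_kernel(Y, Y, w).mean()
            - 2.0 * ard_gaussian_kernel(X, Y, w).mean())

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = rng.normal(size=(200, 5))
Y[:, 0] += 1.0                             # samples differ only in dim 0
w_all = np.ones(5)                         # all variables kept
w_sparse = np.array([0., 1., 1., 1., 1.])  # dim 0 switched off

# Keeping dim 0 yields a much larger discrepancy than dropping it;
# ARD-weight optimisation exploits exactly this kind of signal.
print(mmd2_biased(X, Y, w_all) > mmd2_biased(X, Y, w_sparse))  # True
```

    In the paper's setting the weights are optimised (with sparse regularisation) rather than set by hand, so that the surviving non-zero weights identify the responsible variables.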

    Transformers for Tabular Data Representation: A Survey of Models and Applications

    In the last few years, the natural language processing community has witnessed advances in neural representations of free texts with transformer-based language models (LMs). Given the importance of knowledge available in tabular data, recent research efforts extend LMs by developing neural representations for structured data. In this article, we present a survey that analyzes these efforts. We first abstract the different systems according to a traditional machine learning pipeline in terms of training data, input representation, model training, and supported downstream tasks. For each aspect, we characterize and compare the proposed solutions. Finally, we discuss future work directions.

    Exploring Task-agnostic, ShapeNet-based Object Recognition for Mobile Robots

    This position paper presents an attempt to improve the scalability of existing object recognition methods, which largely rely on supervision and assume the availability of large amounts of manually labelled data points. Moreover, in the context of mobile robotics, data sets and experimental settings are often handcrafted based on the specific task the object recognition is aimed at, e.g. object grasping. In this work, we argue instead that publicly available open data such as ShapeNet can be used for object classification first, and then to link objects to their related concepts, leading to task-agnostic knowledge acquisition practices. To this aim, we evaluated five pipelines for object recognition, where target classes were all entities collected from ShapeNet and matching was based on: (i) shape-only features, (ii) RGB histogram comparison, (iii) a combination of shape and colour matching, (iv) image feature descriptors, and (v) inexact, normalised cross-correlation, resembling the Deep, Siamese-like NN architecture of Submariam et al. (2016). We discuss the relative impact of shape-derived and colour-derived features, as well as the suitability of all tested solutions for future application to real-life use cases.
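    Of the five pipelines listed above, (ii) RGB histogram comparison is the simplest to make concrete. The sketch below is a generic illustration of the technique, not the paper's implementation: the bin count and the histogram-intersection similarity are assumptions, and the two synthetic "images" stand in for real camera crops.

```python
import numpy as np

def rgb_histogram(img, bins=8):
    """Per-channel colour histogram of a uint8 RGB image, concatenated
    across channels and normalised to sum to 1."""
    hists = [np.histogram(img[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical colour distributions."""
    return float(np.minimum(h1, h2).sum())

# Two synthetic single-colour patches standing in for object crops.
red = np.zeros((32, 32, 3), dtype=np.uint8); red[..., 0] = 200
blue = np.zeros((32, 32, 3), dtype=np.uint8); blue[..., 2] = 200

same = histogram_intersection(rgb_histogram(red), rgb_histogram(red))
diff = histogram_intersection(rgb_histogram(red), rgb_histogram(blue))
print(same > diff)  # True: matching colours score higher
```

    A query crop would be classified by computing its histogram and picking the target class whose reference histogram scores the highest intersection, which is why colour-only matching fails whenever two classes share a colour distribution, motivating the combined shape-and-colour pipeline (iii).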